SAGE Genie: a suite with panoramic view of gene expression.
Abstract
If the achievement of complete sequencing of the one-dimensional genetic code of the human genome can be compared with landing a man on the moon, the interpretation of genomic instructions in a four-dimensional biological context, such as during development and disease, will prove a much more challenging and daunting task than getting that man back from the moon to the earth. One of the greatest mysteries of life has been how a fertilized egg, which contains all of the genetic information that defines a living organism, can give rise to so many different tissues, which organize into different organs, as it divides and differentiates. It is clear that to unravel life's mysteries we will have to rely, at least in large part, on tools that allow us to determine when and where a gene is turned on or off in a cell as it divides, differentiates, and ages. Such tools are also important for detecting when and where a seemingly precise interpretation of genomic instructions goes awry, which underlies many disease states such as cancer. Several technologies that show promise of high throughput and potential for global analysis of gene expression were developed in the 1990s (1–3). However, the realization of this promise and potential has been slow in coming, partly because of the lack of a unified standard for accurate data collection, analysis, and presentation for each methodology. As reported in a recent issue of PNAS (4), Boon and colleagues have made a major stride in this direction by developing a suite of bioinformatic tools that provides a single platform for compiling, annotating, and interpreting large sets of gene expression data collected by one of these technologies: serial analysis of gene expression, more widely known as SAGE.
SAGE technology, which was originally developed by Kinzler and Vogelstein's group at Johns Hopkins University (2), is a clever high-throughput 3′ expressed sequence tag (EST) counting methodology. Unlike the original brute-force EST sequencing strategy, where cDNA clones were randomly picked from cDNA libraries, SAGE technology measures the level of gene expression based on the frequency of occurrence of the 3′ signature SAGE tags of 10 bases unique to each transcript. Because only minimal sequence information is necessary to define an expressed gene, or messenger RNA (mRNA), many SAGE tags from different genes can be obtained and sequenced at a time, which greatly speeds up the EST counting process. The method has been used successfully and extensively for comparison of gene expression between a pair of RNA samples to identify differentially expressed genes within a given biological context (5). Such horizontal comparisons mainly focus on SAGE tags corresponding to genes that are either up- or down-regulated, whereas the bulk of the gene expression information, which took a great deal of effort to collect, often sits untapped.
SAGE Genie is a logically laid out suite of bioinformatic tools that allows automatic and reliable matching of SAGE tags to known gene transcripts. This was accomplished by first filtering the millions of tags collected from over 100 different human cell types as part of the National Institutes of Health Cancer Genome Anatomy Project (CGAP), removing experimentally obtained SAGE tags that had incorrect linker sequences, appeared only once, or were generated by sequencing errors. The resulting confident SAGE tags (CSTs) were then used to evaluate and match the virtual SAGE tags predicted from known mRNA transcript (cDNA) sequences in different publicly available databases, including full-length cDNAs or 3′ ESTs.
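The tag filtering and virtual-tag prediction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SAGE Genie implementation. It assumes the commonly used NlaIII anchoring site (CATG), which the text does not name, and takes the 10 bases downstream of the 3′-most site as the virtual tag. It models only the "appeared only once" filter; linker checks and sequencing-error correction are left out. The function names, the transcript, and the tag list are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

ANCHOR = "CATG"  # assumption: NlaIII anchoring site; the text does not name the enzyme
TAG_LEN = 10     # tag length stated in the text


def virtual_tag(cdna: str) -> Optional[str]:
    """Predict the tag for one transcript: the 10 bases after the 3'-most anchor site."""
    seq = cdna.upper()
    pos = seq.rfind(ANCHOR)          # 3'-most occurrence of the anchor site
    if pos == -1:
        return None                  # no anchor site, so no tag can be predicted
    start = pos + len(ANCHOR)
    tag = seq[start:start + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None


def confident_tags(observed: Iterable[str]) -> Dict[str, int]:
    """Count observed tags and drop those seen only once (one of the filters described)."""
    counts = Counter(observed)
    return {tag: n for tag, n in counts.items() if n > 1}


# Hypothetical transcript and observed tag list, for illustration only.
cdna = "GGCATGAAACCCGGGTTTACGTCATGTTGACCTAGGCAAAAA"
observed = ["TTGACCTAGG", "TTGACCTAGG", "ACGTACGTAC"]

csts = confident_tags(observed)
tag = virtual_tag(cdna)
print(tag, tag in csts)
```

Keeping the confident tags as a dictionary of counts means the same structure can serve both for matching against virtual tags and for later expression measurements.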
The virtual tags were divided into different groups based on the database from which the tags were generated, the absence or presence of polyadenylation signals and poly(A) tails, and whether the tags represented differentially spliced or internal (non-3′) transcript sequences. The percentage of virtual tags that match CSTs allows the available databases of known transcript sequences to be ranked. Reciprocal cross-referencing between virtual tags and CSTs provides not only the best match of a CST to a known gene transcript sequence, but also confirmation that experimentally obtained SAGE tags do come mostly from the 3′ ends of mRNA transcripts. The resulting bioinformatic interface allows automatic tag-to-gene identification, measurement of gene expression normalized to the occurrence of a tag per 200,000 tags collected from a SAGE experiment, and identification of the origins from which a tag is counted. Thus, SAGE Genie provides a computational platform on which not only more than two horizontal comparisons (e.g., normal brain versus brain tumors; Fig. 1), but also a nearly unlimited number of vertical comparisons (e.g., different tissue or organ types) of gene expression can be conducted at a global scale. The data output can be presented with interfaces such as the Anatomic Viewer, Digital Northern, and Digital Gene Expression Display for any given SAGE tag or gene transcript of interest, thus providing a quick glance at when and where a gene may be expressed. With SAGE Genie, experimentally collected SAGE tags for each biological system can be continuously annotated and added to the growing number of unique CSTs. With increasing collections of both CSTs and virtual tags, SAGE Genie could prove to be a very powerful tool for archiving and analyzing the expression profile of any given gene in any biological context.
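The two quantities mentioned above, expression normalized per 200,000 tags and the percentage of a database's virtual tags that match CSTs, are simple to compute. The following Python sketch shows one plausible way to do so. The function names, database names, and tag values are hypothetical, and the exact matching rules used by SAGE Genie are not given in the text.

```python
from typing import Dict, Iterable, List, Tuple

SCALE = 200_000  # normalization base stated in the text


def tags_per_200k(tag_count: int, library_total: int) -> float:
    """Normalize a raw tag count to occurrences per 200,000 tags in the library."""
    if library_total <= 0:
        raise ValueError("library_total must be positive")
    return tag_count * SCALE / library_total


def match_percentage(virtual_tags: Iterable[str], csts: Iterable[str]) -> float:
    """Percentage of distinct virtual tags that are also confident SAGE tags."""
    virtual = set(virtual_tags)
    if not virtual:
        return 0.0
    return 100.0 * len(virtual & set(csts)) / len(virtual)


def rank_databases(databases: Dict[str, Iterable[str]],
                   csts: Iterable[str]) -> List[Tuple[str, float]]:
    """Rank databases by how well their virtual tags match the CSTs, best first."""
    cst_set = set(csts)
    scored = [(name, match_percentage(tags, cst_set)) for name, tags in databases.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data, for illustration only.
csts = {"AAAAAAAAAA", "CCCCCCCCCC", "GGGGGGGGGG"}
databases = {
    "est_3prime": ["AAAAAAAAAA", "TTTTTTTTTT", "ACGTACGTAC", "GGGGGGGGGG"],
    "full_length": ["AAAAAAAAAA", "CCCCCCCCCC"],
}

print(tags_per_200k(30, 60_000))        # a tag seen 30 times among 60,000 tags
print(rank_databases(databases, csts))
```

Normalizing every library to the same base is what makes counts from libraries of different sizes comparable across tissues.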
In contrast, DNA microarray methodology (3), which has received much attention recently in the field of gene expression analysis, is still lacking a unified standard for
Similar resources
An anatomy of normal and malignant gene expression.
A gene's expression pattern provides clues to its role in normal physiology and disease. To provide quantitative expression levels on a genome-wide scale, the Cancer Genome Anatomy Project (CGAP) uses serial analysis of gene expression (SAGE). Over 5 million transcript tags from more than 100 human cell types have been assembled. To enhance the utility of this data, the CGAP SAGE project create...
Serial analysis of gene expression (SAGE) in normal human trabecular meshwork
PURPOSE To identify the genes expressed in normal human trabecular meshwork tissue, a tissue critical to the pathogenesis of glaucoma. METHODS Total RNA was extracted from human trabecular meshwork (HTM) harvested from 3 different donors. Extracted RNA was used to synthesize individual SAGE (serial analysis of gene expression) libraries using the I-SAGE Long kit from Invitrogen. Libraries wer...
Enhanced Expression of Genes Involved in the Biosynthesis Pathway of Tanshinones in Tetraploid Plants of Salvia Officinalis L.
Extended Abstract Introduction and Objective: Polyploidy is one of the main factors in plant adaptation that can increase secondary metabolites production in plants. Salvia officinalis L. is a perennial plant from the Lamiaceae family with a long history of use in the medicinal industry. Tanshinones are crucial active compounds biosynthesized in Salvia. This study was aimed to analyze the expr...
Detection of over Expression of Gene in Small Lung Carcinoma by Cigarette Smoking
Small lung carcinoma is an endemic disease throughout the world which occurs mainly by cigarette smoking. A mutagenous compound known as nitro benzene is formed in the body due to cigarette smoking which causes mutation in normal tissues of lung and hence they are changed into malignant tissues by over growth of the tissues. Cancer Genome Anatomy Project website developed by National Cancer Ins...
A panoramic view of gene expression in the human kidney.
To gain a molecular understanding of kidney functions, we established a high-resolution map of gene expression patterns in the human kidney. The glomerulus and seven different nephron segments were isolated by microdissection from fresh tissue specimens, and their transcriptome was characterized by using the serial analysis of gene expression (SAGE) method. More than 400,000 mRNA SAGE tags were...
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume 99, Issue 18
Publication date: 2002